| Florian Cramer on Wed, 19 Dec 2007 03:24:50 +0100 (CET) |
| <nettime> Critique of the "Semantic Web" |
[This is a lecture manuscript written for the "Quaero Forum" on the
politics and culture of search engines at Jan van Eyck Academy
Maastricht, 9/2007 - it's still a bit rough; thanks to Felix Stalder for
useful corrections and his suggestion to post it here. -F]
Animals that Belong to the Emperor
Failing universal classification schemes from Aristotle to the Semantic
Web
Quaero Forum, Maastricht
Florian Cramer
29/9/2007
The weapon with which state-subsidized European search technology
projects allegedly intend to beat Google is semantic information
processing: pattern recognition in media files in the French Quaero
project, Semantic Web technology in Theseus, its German offspring.
Originally, Quaero was a French-German collaboration, funded by both
governments, until the German Theseus project split off from Quaero to
pursue its own vision of future Web search. This vision is twofold,
involving a number of classic holy grails of computer science:
1. to provide search on the basis of Semantic Web meta tags,
2. to have software recognize the contents of web pages in order to
automatically apply those tags.
While the second point is utopian enough and something that Artificial
Intelligence research failed to achieve for decades, even the first
point, the universal nomenclature of semantic tagging known as the
Semantic Web, is doomed to fail by any critical standard of cultural
reflection. The reason why the Theseus project nevertheless receives
high public funding is economic and political, but, with its stated
goals, hardly related to anything resembling a working web search
engine.
Founded and pursued by Tim Berners-Lee, the original architect of the
World Wide Web, the "Semantic Web" is a term and project that is not
only prone to major confusion, but also emblematic of how the
alienation between engineering and humanities goes both ways:
shockingly naive and simplistic understandings of cultural concepts
among the former, and a complete misunderstanding of the "Semantic Web"
among the latter because its terminology of "semantics" and
"ontologies" is plainly weird or mystifying outside computer science.
In 2004, prior to Quaero and Theseus, the German federal government
subsidized research on the Semantic Web with 13.7 million Euro,
reasoning that as a "semantic technology", it would allow people to
phrase search terms as normal questions, thus giving computer
illiterates easier access to the Internet. But the Semantic Web is
actually not about this at all; the funding was, in other words, a
13.7 million Euro misunderstanding. {1}
Natural language question parsing indeed is another holy grail of
Artificial Intelligence research, parodied by Weizenbaum's "Eliza", and
tried by Web search engines from "Ask Jeeves" - which renamed itself
Ask.com after deemphasizing its original concept - to "Powerset",
recently brought up by Geert Lovink on the Nettime mailing list.{2}
Full semantic natural language understanding falls into the previously
mentioned second category, the nut that "hard" A.I. research has
claimed over decades to have almost, but just not quite cracked, while
critical A.I. researchers like Luc Steels claim that it cannot be
reached with current computer architectures regardless of their speed. In
search engine reality, natural language search systems boil down to
nothing more than inefficient interface wrappers around Boolean search
expressions with their logical AND, OR and NOT operators.
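To make this reduction concrete, here is a minimal sketch in Python. The
stop word list and the function name question_to_boolean are assumptions
made up for this illustration; it does not describe how Ask Jeeves or
Powerset actually work, only how a question can collapse into a plain AND
query.

```python
import re

# Hypothetical stop word list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "what", "where", "who", "how",
              "do", "does", "i", "can", "find", "of", "in", "at", "to", "for"}

def question_to_boolean(question: str) -> str:
    """Strip a natural language question down to a Boolean AND query."""
    # Lowercase the question and split it into word tokens.
    words = re.findall(r"[a-z0-9']+", question.lower())
    # Whatever is not a stop word is treated as a search keyword.
    keywords = [w for w in words if w not in STOP_WORDS]
    return " AND ".join(keywords)

print(question_to_boolean(
    "Where can I find pictures of clowns at birthday parties?"))
```

Nothing in this sketch understands the question; the "natural language"
layer only throws words away before the Boolean search runs.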
The Semantic Web does not fall into this trap because it does not
involve any automatic interpretation of meaning. Instead, Berners-Lee
insists that his project "does not imply some magical artificial
intelligence which allows machines to comprehend human mumblings"{3}
- in sharp contradiction to the stated goal of the Theseus project.
Instead, he conceives of the Semantic Web as a universal, unified
markup or "meta tagging" system: "Instead of asking machines to
understand people's language, it involves asking people to make the
extra effort".
This effort, semantic tagging, is a well-established and popular device
on sites like the photo sharing platform flickr.com, the news
aggregator digg.com and the bookmarking site del.icio.us. It simply
means that users attach keywords to texts, images and other resources,
making the information searchable by keywords or particular keyword
combinations. On Flickr, for example, the search keyword combination
"birthday", "children" and "clown" results in a list of pictures of
clowns appearing at children's birthday parties - not because of any
Quaero-style computer recognition of the image contents and
Theseus-style automatic keyword mapping, but because the keywords had
been manually assigned to these images by Flickr users.
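Such a keyword search can be sketched in a few lines of Python. The photo
names, tags and the function search below are invented for illustration and
are not Flickr's actual data model or API; the point is only that the match
is a set comparison on manually assigned words.

```python
# Hypothetical photo records; file names and tags are invented.
PHOTOS = {
    "party_01.jpg": {"birthday", "children", "clown"},
    "party_02.jpg": {"birthday", "kids", "clown"},
    "circus.jpg": {"clown", "elephant"},
}

def search(photos, *keywords):
    """Return the names of photos whose tag set contains every keyword."""
    wanted = set(keywords)
    # A photo matches only if all wanted keywords are among its tags.
    return sorted(name for name, tags in photos.items() if wanted <= tags)

print(search(PHOTOS, "birthday", "children", "clown"))
```

In this sketch only party_01.jpg is returned; party_02.jpg shows the same
kind of scene but was tagged with a different word.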
While such manual tagging also lies at the heart of the Semantic Web,
systems like those of flickr, digg and del.icio.us are nevertheless
flawed from its perspective because they involve no unified standard or
nomenclature for tagging. If, for example, a user tagged an image with
the word "kids" instead of "children", it will not turn up in the
search result. On top of that, the tags lack abstraction and
universality: children for example could be classified as a subset of
humans, humans as a subset of mammals; birthdays as a subset of
celebrations etc. With such a classification, pictures marked up with
"birthday" and "children" could also be found in a more general search
for pictures of human celebrations. For this reason, unsystematic,
ad-hoc, user-generated and site-specific tagging systems like those on
Flickr are referred to as "folksonomies".{4}
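The kind of hierarchical classification described above can be sketched as
follows, again in Python. The PARENT table only encodes the subset relations
named in the text; the function names ancestors and matches are hypothetical
and do not correspond to any Semantic Web standard or library.

```python
# Hypothetical "is a subset of" links, taken from the examples in the text.
PARENT = {
    "children": "humans",
    "humans": "mammals",
    "birthday": "celebrations",
}

def ancestors(tag):
    """Return the tag itself plus every broader category above it."""
    chain = [tag]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return set(chain)

def matches(photo_tags, query):
    """True if every query term equals or subsumes at least one photo tag."""
    expanded = set()
    for tag in photo_tags:
        expanded |= ancestors(tag)
    return set(query) <= expanded

print(matches({"birthday", "children"}, {"humans", "celebrations"}))
print(matches({"birthday", "kids"}, {"humans", "celebrations"}))
```

The first call succeeds because "children" and "birthday" are part of the
agreed hierarchy; the second fails because "kids" is not. The hierarchy
helps only as far as everybody tags with the same vocabulary.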
The Semantic Web promises to overcome folksonomies with one, unified
and standardized keyword tagging system that can be applied to anything.
In other words, it is a universal classificatory description system
and grand unified hierarchical meta tag tree. In line with computer
science terminology, but sounding mysterious and idiosyncratic to anyone
else, Berners-Lee calls this classificatory system an "ontology",
making the project particularly confusing for people with backgrounds
in philosophy and humanities - because what he and computer science
call "ontology" is, outside such jargon and in a more common sense
language, not an ontology, but a cosmology.
Cosmologies are by no means new, and neither are universal
classification and tagging systems of all things in the world. In his
essay and short story "The Analytical Language of John Wilkins", Jorge
Luis Borges writes of the 17th-century English scholar that
"He divided the universe in forty categories or classes, these being
further subdivided into differences, which was then subdivided into
species. He assigned to each class a monosyllable of two letters; to
each difference, a consonant; to each species, a vowel. For example:
de, which means an element; deb, the first of the elements, fire;
deba, a part of the element fire, a flame." [...]
Similar classification schemes were designed throughout the Middle
Ages and Renaissance among others by Ramon Llull, Giordano Bruno, the
encyclopedist Johann Heinrich Alsted and the theosophist Jan Amos
Comenius, scholars in whose tradition Wilkins, a founding member of the
"Invisible College", works and thinks. Before Diderot's and
d'Alembert's revolutionary, heretical device of arbitrarily structuring
human knowledge by the alphabet, encyclopedias had developed
increasingly complex tree-like classification systems of all things in
the world they described.{5} The cosmology-called-ontology of the
Semantic Web is not only similar, but precisely the same.
Medieval and Renaissance classificatory cosmologies could only work on
the basis of a stable assumption of what the world is and how it is
structured: for example, by the four directions, the four seasons, the
four temperaments, the seven virtues and seven vices, etc. They were,
in other words, still embedded into the paradigm of Medieval scholastic
science that in turn had been derived from Aristotle's system of
categories and its classification of beings into genera and species.
The Semantic Web is, bluntly said, nothing but technocratic
neo-scholasticism based on a naive if not dangerous belief that the
world can be described according to a single and universally valid
viewpoint; in other words, a blatant example of cybernetic control
ideology and engineering blindness to ambiguity and cultural issues.
Although no Semantic Web existed in the 1940s, Borges' essay hits
the nail on the head. One is tempted to replace the name John Wilkins
with Tim Berners-Lee when Borges reviews the former's categories and
finds that stones, for example, are absurdly classified as either
common, or modic, precious, transparent and insoluble, or that beauty
is assigned to a "living brood fish". He concludes that
"These ambiguities, redundancies and deficiencies remind us of those
which doctor Franz Kuhn attributes to a certain Chinese
encyclopaedia entitled 'Celestial Empire of benevolent Knowledge'.
In its remote pages it is written that the animals are divided into:
(a) belonging to the emperor, (b) embalmed, (c) tame, (d) sucking
pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the
present classification, (i) frenzied, (j) innumerable, (k) drawn
with a very fine camelhair brush, (l) et cetera, (m) having just
broken the water pitcher, (n) that from a long way off look like
flies."
Although this is Borges' own fiction, it nevertheless reveals the
arbitrariness of categories and classifications. It also had a thorough
impact as a philosophical critique. Michel Foucault's "The Order of
Things" begins with a discussion of the above list of animals, which,
as he admitted elsewhere, "shattered all the familiar landmarks" of his
thought, opening his eyes to how the order of knowledge is culturally
constructed and may be conceived differently. To understand Foucault's
discourse theory, it practically suffices to read Borges' "Ficciones".
The order of things, and unified classification schemes, do not just
break down in fiction. Sticking to the example of animals, it is
obvious how Aristotelian philosophy continues to exist today, in the
notion of genus and species, and even more questionably in the
categorization of humans into biological races. But it does not even
work in biology itself. The platypus, an Australian animal that is a
breastfeeding mammal but lays eggs, lives in the water and has a beak
like a bird, famously defies the classifications that historically go
back to Aristotle's "Zoology". If the platypus breaks genus and species
classification, where would it fit into the Semantic Web?
In his book "Kant and the Platypus", Umberto Eco points out how the
animal marks the difference between scholastic and empirical
science.{6} A bit confusingly, he differentiates "cultural cases" -
that means categorically defined phenomena - from "empirical cases",
i.e. phenomena that are observed instead of predefined. "To be
recognized as such," Eco states, cultural cases "need reference to a
framework of cultural norms" (Eco 1997, p. 139). For Eco as a
semiotician, this means that Being, or existence, is the frontier that
systematic science cannot conquer - and this is what, in a
philosophical sense, ontology means.
The innovation of modern science since Galileo, Newton and Descartes is
that it operates without reference to those norms. When Diderot and
d'Alembert abandoned the old classificatory order of knowledge in
encyclopedias and replaced it with a non-classificatory,
non-systematic alphabetic order, they precisely followed the empirical
paradigm, taking phenomena as they occurred and not as they fit. In
order to be a thoroughly critical investigation and abandon
preconceptions, science gave up "Semantic Web"-like schemes.
Returning to Internet folksonomies, a better example than the Platypus
was brought up in a Web forum of the German computer news site
heise.de. Discussing the Semantic Web and its classification scheme, an
anonymous poster brought up the hypothetical example "A Muslim is a
potential terrorist" in order to show that a unified semantic
"ontology"/cosmology cannot be built. This example scratches only the
surface of the pending cultural problems, since not the empirical cases
like the Platypus, but cultural ones bear the real dynamite. It sheds a
dubious light on computer linguists involved in the project if they
don't even seem to have done their homework on Saussure and the
arbitrariness, i.e. cultural dynamics, of the signifier in relation to
the signified. The Semantic Web, and any search engine or database
built upon it, rests on the illusion that an unambiguous assessment of
the world would be even theoretically possible. Beyond cosmology
falsely named ontology, it is metaphysics disguised as physics.
On a more practical (but nonetheless cultural) level, the Semantic Web
relies on a clean room illusion of a culture where semantic tags
wouldn't simply be used for spamming and search engine manipulation
which are already common enough for Google and other search engines to
ignore meta tags embedded into web pages. And while Berners-Lee is
realist enough to state that meta tagging cannot be done by bots like
those dreamed up by the Theseus project, his Semantic Web implies a
complexity nightmare of meta information overtaking information, with
each piece of information creating at least twice as much work for its
semantic markup as for its creation proper, comparable to a library
whose catalogs outnumber the books they reference.
"Semantics" and "ontology" are useful terms because they reference what
computers, as purely syntactical machines, cannot process, and which
can't be mapped into computer data structures except in subjective,
diverse, culturally controversial and folksonomic ways. The creators of
the so-called "Semantic Web" and "next-generation" search engines might
learn from Borges who concludes:
"I have registered the arbitrarities of Wilkins, [and] of the
unknown (or false) Chinese encyclopaedia writer [...]; it is clear
that there is no classification of the Universe not being arbitrary
and full of conjectures. The reason for this is very simple: we do
not know what thing the universe is."
__________________________________________________________________
Footnotes:
{1} User comment on heise.de: "I somehow get the impression that our
Federal Ministry of Research is under the mistaken assumption that 13
million euros will produce software that enables every computer
illiterate, entirely without the `extra effort', [...] market [the]
`Pisa failure [...] as a highly innovative rescue of Germany as a
location for knowledge and business (if you believe that ...),
{2} Geert Lovink, search engines on the move, 19/9/2007,
http://www.nettime.org/Lists-Archives/nettime-l-0709/msg00028.html
{3} Quoted after: An interview with Tim Berners-Lee,
http://www.simple-talk.com/content/print.aspx?article=321
{4} "Folksonomy (also known as collaborative tagging, social
classification, social indexing, social tagging, and other names) is
the practice and method of collaboratively creating and managing tags
to annotate and categorize content. In contrast to traditional subject
indexing, metadata is not only generated by experts but also by
creators and consumers of the content. Usually, freely chosen keywords
are used instead of a controlled vocabulary", Wikipedia definition as
of 18/12/2007, http://en.wikipedia.org/w/index.php?title=Folksonomy
{5} As a remnant of this tradition, the Diderot/d'Alembert
encyclopedia still contains such a knowledge tree.
{6} Eco, Kant and the Platypus, 1997, p. 68
--
http://cramer.plaintext.cc:70
gopher://cramer.plaintext.cc